Fast Algorithms for Estimating Mixture Parameters¹


A.  Problem: 

The project consists of investigating three numerical methods for estimating
the parameters of a mixture distribution. The estimates for the parameter
$\theta$ are obtained by maximizing the log-likelihood function

\[ L(\theta) = \sum_k \log p(x_k \mid \theta) = \sum_k \log p_k \]

where $\{x_1, \ldots, x_n\}$ is a sample of size $n$ from a mixture distribution.
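As a concrete reading of the objective, the following minimal sketch evaluates
the log-likelihood of a sample under a mixture of univariate normals. The
function names and the parameterization by weights, means, and standard
deviations are illustrative assumptions, not notation from this report.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, means, sds):
    """Mixture density p(x | theta): weighted sum of component densities."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))

def log_likelihood(sample, weights, means, sds):
    """L(theta) = sum over the sample of log p(x_k | theta)."""
    return sum(math.log(mixture_pdf(x, weights, means, sds)) for x in sample)
```

Any of the three methods described below can be viewed as a way of driving
this quantity to a local maximum over the parameters.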

1. Accelerated Scoring: The first approach depends on the fact that the
second derivative of the log-likelihood function for a mixture distribution
has the form

\[ L''(\theta) = -\sum_k \frac{\nabla_\theta p_k \, (\nabla_\theta p_k)^T}{p_k^2} + \sum_k \frac{\nabla_\theta^2 p_k}{p_k}. \]

The first term of $L''$ involves only first derivatives and is easy to
compute relative to the second term. Therefore, the bulk of the computation
in using Newton's method to estimate $\theta$ would involve obtaining
the second derivative term. This phase of the project involved replacing
the second derivative term with various quasi-Newton approximations.

2. Accelerated Fixed-point: The EM algorithm is often used for obtaining
parameter estimates for mixtures. It can be viewed as a procedure for
finding a solution to an equation of the form $\theta = G(\theta)$. Using
Newton's method to solve this problem requires the computation of
$I - G'(\theta)$. This phase of the project investigates replacing the
derivative by a quasi-Newton approximation obtained by computing updating
vectors based on several EM steps and applying a quasi-Newton update to
accelerate the convergence.

3. EM Relaxation: This phase of the project investigates the effect of
a relaxation of the form

\[ \theta_{N+1} = \frac{1}{1-\lambda}\, G(\theta_N) - \frac{\lambda}{1-\lambda}\, \theta_N \]

¹The view, opinions, and/or findings contained in this report are those of the authors
and should not be construed as an official Department of the Army position, policy, or
decision unless so designated by other documentation.
where

\[ \lambda = \frac{(\theta_N - \theta_{N-1})^T (\theta_{N-1} - \theta_{N-2})}{|\theta_{N-1} - \theta_{N-2}|^2}. \]

The parameter $\lambda$ is an approximation to the largest eigenvalue of $G'$,
and so this approach is a generalization of the one-dimensional Aitken
acceleration.
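The relaxation can be illustrated with a small sketch. It assumes the relaxed
step takes the Aitken form $\theta_{N+1} = (G(\theta_N) - \lambda\theta_N)/(1-\lambda)$
and is applied once after a run of plain fixed-point steps; both are
assumptions made for illustration. The linear map `G`, the matrix `M`, and the
function name `relaxed_step` are hypothetical stand-ins for an EM iteration.

```python
import numpy as np

def relaxed_step(G, theta_nm2, theta_nm1, theta_n):
    """One relaxed update built from three consecutive fixed-point iterates."""
    d_prev = theta_nm1 - theta_nm2
    d_curr = theta_n - theta_nm1
    # Projection of the newest difference onto the previous one; near the
    # fixed point this estimates the dominant eigenvalue of G'.
    lam = float(d_curr @ d_prev) / float(d_prev @ d_prev)
    # Assumed Aitken-type relaxation (illustrative, not confirmed by the source).
    theta_next = (G(theta_n) - lam * theta_n) / (1.0 - lam)
    return theta_next, lam

# Hypothetical linear test map with fixed point (1, 2); G' = M has
# eigenvalues 0.9 and 0.3, so plain iteration converges slowly.
M = np.diag([0.9, 0.3])
fixed_point = np.array([1.0, 2.0])
b = (np.eye(2) - M) @ fixed_point

def G(theta):
    return M @ theta + b

iterates = [np.zeros(2)]
for _ in range(12):
    iterates.append(G(iterates[-1]))

plain = G(iterates[-1])
relaxed, lam = relaxed_step(G, iterates[-3], iterates[-2], iterates[-1])
plain_error = np.linalg.norm(plain - fixed_point)
relaxed_error = np.linalg.norm(relaxed - fixed_point)
```

In this example the relaxed step removes the slowly decaying error component
but amplifies the fast one, which is why the sketch applies it only after
several plain steps rather than at every iteration.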


B.  Results: 

The investigation is a two-year project with the first year sponsored by
the Army Research Office and the second year by the National Science
Foundation (Grant Number: DMS-0088995). The results reported are those
obtained under the sponsorship of the Army Research Office during the period
July 1, 1988 to June 30, 1989. During this period the work on the
accelerated scoring technique was completed and prepared for publication, and
the numerical testing of the accelerated fixed-point method was completed.
The work on relaxation methods will be done under the sponsorship of the
National Science Foundation during the coming year.


Accelerated Scoring: Newton-like iterative procedures for maximizing $L$ have
the form

\[ \theta_{N+1} = \theta_N - B_N^{-1} \nabla_\theta L(\theta_N) \]

where the method is determined by the choice of $B_N$. Newton's method
itself uses $B_N = L''(\theta_N)$. As was mentioned above,
$L''(\theta) = C(\theta) + A(\theta)$, where

\[ C = -\sum_k (\nabla_\theta p_k / p_k)(\nabla_\theta p_k / p_k)^T \quad \text{and} \quad A = \sum_k \nabla_\theta^2 p_k / p_k. \]

The Method of Scoring (MOS) uses updates with $B_N = C(\theta_N)$, which
simply ignores the second derivative term under the supposition that in
general its effect is small.
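The split of $L''$ into a first-derivative term and a second-derivative term
can be checked numerically. The sketch below uses a hypothetical one-parameter
example, an equal-weight mixture of N(mu, 1) and N(0, 1) with only mu unknown,
chosen purely for illustration; the function names are likewise assumptions.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def p(x, mu):
    """Mixture density 0.5*N(mu, 1) + 0.5*N(0, 1) at x."""
    return 0.5 * phi(x - mu) + 0.5 * phi(x)

def dp(x, mu):
    """First derivative of p with respect to mu."""
    return 0.5 * phi(x - mu) * (x - mu)

def d2p(x, mu):
    """Second derivative of p with respect to mu."""
    return 0.5 * phi(x - mu) * ((x - mu) ** 2 - 1.0)

def hessian_terms(sample, mu):
    """Return (C, A): the first-derivative and second-derivative parts of L''."""
    C = -sum((dp(x, mu) / p(x, mu)) ** 2 for x in sample)
    A = sum(d2p(x, mu) / p(x, mu) for x in sample)
    return C, A

def log_likelihood(sample, mu):
    return sum(math.log(p(x, mu)) for x in sample)
```

The Method of Scoring would use only `C` from `hessian_terms`; Newton's method
would use `C + A`, which can be compared against a finite-difference second
derivative of `log_likelihood`.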

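We have developed a method, called MLE updating, which does not ignore
the second derivative but uses $C(\theta_N)$ together with a quasi-Newton
approximation $A_N$ to $A(\theta_N)$, so that $B_N = C(\theta_N) + A_N$.
The update is given by

\[ \theta_{N+1} = \theta_N - (C(\theta_N) + A_N)^{-1} \nabla_\theta L(\theta_N) \]

where

\[ A_{N+1} = A_N + \frac{w_N v_N^T + v_N w_N^T}{v_N^T s_N} - \frac{s_N^T w_N}{(v_N^T s_N)^2}\, v_N v_N^T, \]

\[ s_N = \theta_{N+1} - \theta_N, \]

\[ v_N = \nabla_\theta L(\theta_{N+1}) - \nabla_\theta L(\theta_N), \]

\[ w_N = \sum_k \frac{\nabla_\theta p(x_k \mid \theta_{N+1}) - \nabla_\theta p(x_k \mid \theta_N)}{p(x_k \mid \theta_{N+1})} - A_N s_N. \]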
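A minimal sketch of the matrix update follows. It assumes the update has the
symmetric rank-two structured secant form
$A_{N+1} = A_N + (w v^T + v w^T)/(v^T s) - (s^T w)/(v^T s)^2\, v v^T$, which is
a reading of the formula and not a confirmed transcription; the function name
`structured_secant_update` is hypothetical.

```python
import numpy as np

def structured_secant_update(A, s, v, w):
    """Symmetric rank-two update of A (assumed form, for illustration).

    s is the parameter step, v the gradient difference, and w the residual
    (target for A_new @ s) minus A @ s. The denominator v @ s must be nonzero.
    """
    vs = float(v @ s)
    rank_two = (np.outer(w, v) + np.outer(v, w)) / vs
    correction = (float(s @ w) / vs ** 2) * np.outer(v, v)
    return A + rank_two - correction
```

Under this form the updated matrix stays symmetric and satisfies the secant
condition `A_new @ s == A @ s + w`, which is the property a quasi-Newton
approximation to the second derivative term is normally required to have.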
Theoretical Results


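THEOREM: If $\theta^*$ is a point for which $\nabla_\theta L(\theta^*) = 0$,
$\nabla_\theta L$ is differentiable on an open convex neighborhood $\Omega$ of
$\theta^*$, $C$ is continuous at $\theta^*$, there are $\gamma > 0$
and $p \in (0, 1]$ such that

\[ |L''(\theta) - L''(\theta^*)| \le \gamma\, |\theta - \theta^*|^p \]

for all $\theta \in \Omega$, $L''(\theta^*)$ is invertible, there is a $\gamma_C > 0$ so that

\[ |C(\theta) - C(\theta^*)| \le \gamma_C\, |\theta - \theta^*|^p, \]

then there are $\epsilon > 0$ and $\delta > 0$ such that if
$|\theta_0 - \theta^*| < \epsilon$ and $|A_0 - A(\theta^*)| < \delta$,
the MLE updates are well-defined and converge q-superlinearly to $\theta^*$.
Furthermore, $\{|B_N|\}_{N=0,1,\ldots}$ and $\{|B_N^{-1}|\}_{N=0,1,\ldots}$ are
uniformly bounded.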

This theorem shows that the convergence properties of the MLE updating
method are comparable to those of the well-known least squares methods.

Numerical  Experiments 

Numerical experiments were conducted with samples from mixtures of
univariate normals. The parameters used are detailed in Gonglewski and
Walker (see C. below); here we summarize in Table 1 representative
results obtained using five methods:

1.  MLE  updating  (MLE) 

2.  Finite  Difference  Hessian  (FDH) 

3.  Method  of  Scoring  (MOS) 

4.  Broyden-Fletcher-Goldfarb-Shanno  Updating  (BFGS) 

5.  EM  Algorithm  (EM) 

for samples with various distances between population means, $\Delta\mu$.

This testing shows that MLE updating compares very favorably with
standard general optimization methods. Both the MLE and standard general
optimization methods seem likely to outperform the EM algorithm on
mixture estimation problems, especially when the component populations
are poorly separated.
Table 1.

$\Delta\mu$    MLE    FDH    MOS    BFGS      EM
6.0             13     11     14      24       5
4.0             13      7     13      22      44
2.0             10      8     15      26     SS3
1.0             22     23     27      46     777
0.4             42     37     38      76    1381
0.2             54     51     59      87    3095

Iterations  required  to  obtain  \6,\r  —  0],^  <.  10  ° 


Accelerated Fixed-point: A Newton acceleration of the EM algorithm could
be accomplished by periodically inserting between EM iterations,
$\theta_{N+1} = G(\theta_N)$, an update of the form

\[ \theta_{N+1} = \theta_N - F'(\theta_N)^{-1} F(\theta_N) \]

where $F(\theta) = \theta - G(\theta)$. Our quasi-Newton method replaces the
inverse of the derivative by a Broyden-update approximation

\[ B_{N+1}^{-1} = (I + v_N s_N^T) B_N^{-1} \]

where $s_N = G(\theta_N) - \theta_N$ and

\[ v_N = \frac{s_N - B_N^{-1}(F(G(\theta_N)) - F(\theta_N))}{s_N^T B_N^{-1}(F(G(\theta_N)) - F(\theta_N))}. \]

This approach does not require the computation of a derivative, and only the
vectors $s_N$ and $v_N$ need be stored.
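The inverse update can be sketched in a few lines. The code below stores the
full matrix for clarity instead of only the update vectors, and the way the
quasi-Newton step is interleaved with the fixed-point step in
`accelerated_step` is an assumption made for illustration, not the scheduling
used in the experimental code. Both function names are hypothetical.

```python
import numpy as np

def broyden_inverse_update(B_inv, s, y):
    """Return (I + v s^T) B_inv with v = (s - B_inv y) / (s^T B_inv y).

    s is the step G(theta) - theta and y is F(G(theta)) - F(theta).
    The scalar s^T B_inv y must be nonzero.
    """
    By = B_inv @ y
    v = (s - By) / float(s @ By)
    return (np.eye(len(s)) + np.outer(v, s)) @ B_inv

def accelerated_step(G, theta, B_inv):
    """One fixed-point step to build (s, y), then one quasi-Newton step."""
    def F(t):
        return t - G(t)
    g = G(theta)            # plain fixed-point (EM-type) step
    s = g - theta
    y = F(g) - F(theta)
    B_inv = broyden_inverse_update(B_inv, s, y)
    # Quasi-Newton step on F(theta) = 0 using the updated inverse approximation.
    return theta - B_inv @ F(theta), B_inv
```

The updated matrix satisfies the secant condition `B_inv_new @ y == s`, so for
a one-dimensional linear map a single accelerated step lands on the fixed
point exactly.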

The graduate students Wick and Shea, under the direction of Walker
and Windham, have constructed an experimental code which, according to
options selected by the user, generates samples from mixtures of univariate or
multivariate normally distributed random variables and determines
approximate maximum-likelihood estimates using either the unmodified EM
algorithm or the EM algorithm accelerated via Broyden updating. Preliminary
computational results look promising, but a good bit of additional coding
and numerical experimentation needs to be done to bring this phase of the
research to completion. The preliminary results show that one can expect
the number of iterations required for well-separated populations to be the
same with or without updating, but when the populations are poorly
separated, updating reduces the computation by fifty percent in general and
by as much as seventy percent.

The work on the EM algorithm for mixtures has also led to a very promising
line of investigation bearing on perhaps the most difficult problem
associated with mixture estimation: determining the number of component
populations in a mixture. This investigation centers around a certain
"information ratio" matrix involving the Fisher information matrices of the
mixture data and of the "labeled" data given the mixture data. Windham, with
the collaboration of Walker and H.-H. Bock of the Rheinisch-Westfälische
Technische Hochschule, Aachen, FRG, has shown that the eigenvalues of this
"information ratio" matrix, especially the extreme eigenvalues, are closely
related both to the number of component populations in a mixture and to the
speed of convergence of the EM algorithm. Thus the observed speed of
convergence of the EM algorithm may be used to evaluate the validity of
assumptions about the number of component populations in the mixture.